RapidMiner Studio documentation, version 10.2
Get Pages (Web Mining)
Synopsis
Gets pages from URLs in an attribute and stores them in a new attribute.
Description
This operator retrieves the pages whose URLs are contained in the input data set. For each row in the data set, the URL is read from the attribute specified by the parameter link_attribute, and a GET request is sent to that URL. The retrieved page is stored in a new attribute whose name is specified by the parameter page_attribute.
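The row-by-row behavior described above can be illustrated with a minimal Python sketch. This is not the operator's implementation: the data set is modeled as a list of dictionaries, and the names `fetch_page`, `get_pages`, and `timeout_s` are hypothetical helpers introduced only for illustration.

```python
from urllib.request import Request, urlopen


def fetch_page(url, timeout_s=10.0):
    """Send a GET request and return the response body as text.

    timeout_s is an assumed placeholder value, not an operator default.
    """
    request = Request(url, method="GET")
    with urlopen(request, timeout=timeout_s) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def get_pages(rows, link_attribute, page_attribute, fetch=fetch_page):
    """Return copies of the rows with the fetched page stored under page_attribute.

    rows is a list of dicts standing in for the Example Set (an assumption).
    fetch is injectable so the loop can be tried without network access.
    """
    result = []
    for row in rows:
        new_row = dict(row)  # leave the input rows unchanged
        new_row[page_attribute] = fetch(row[link_attribute])
        result.append(new_row)
    return result
```

Calling `get_pages(rows, "url", "page")` on rows that each have a `url` key would return the same rows with an additional `page` key holding the response body.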
Input
- Example Set (Data Table)
The Example Set port.
Output
- Example Set (Data Table)
The Example Set port.
Parameters
- link_attribute: The attribute that contains the URLs.
- page_attribute: The name of the new attribute that will contain the retrieved pages.
- random_user_agent: Chooses a user agent at random from a set of 7000 user agents.
- user_agent: The user agent to send with each request.
- connection_timeout: The timeout (in ms) for establishing the connection.
- read_timeout: The timeout (in ms) for reading from the URL.
- follow_redirects: Specifies whether redirects should be followed.
- accept_cookies: Specifies whether cookies should be accepted.
- cookie_scope: Specifies the scope of the cookies used.
- request_method: Specifies the request method.
- delay: Specifies whether execution is not delayed, delayed by a fixed amount of time, or delayed by a random amount of time.
- delay_amount: The delay amount in ms.
- min_delay_amount: The minimum delay amount in ms.
- max_delay_amount: The maximum delay amount in ms.
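How the three delay parameters might interact can be sketched as follows. The mode names `"none"`, `"fixed"`, and `"random"`, the default values, and the functions `delay_ms` and `pause` are assumptions made for illustration; the operator's actual option names and defaults are not stated on this page.

```python
import random
import time


def delay_ms(mode, delay_amount=0, min_delay_amount=0, max_delay_amount=0, rng=random):
    """Return the pause before the next request, in milliseconds.

    The mode strings and zero defaults are hypothetical placeholders.
    """
    if mode == "none":
        return 0
    if mode == "fixed":
        return delay_amount
    if mode == "random":
        # Uniform integer delay between the minimum and maximum, inclusive.
        return rng.randint(min_delay_amount, max_delay_amount)
    raise ValueError(f"unknown delay mode: {mode!r}")


def pause(mode, **kwargs):
    """Sleep for the computed delay; time.sleep expects seconds."""
    time.sleep(delay_ms(mode, **kwargs) / 1000.0)
```

In this sketch, delay_amount is only consulted in the fixed mode, and min_delay_amount and max_delay_amount only in the random mode. Pausing between requests in this way is a common courtesy toward the servers being queried.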